An HMM-Based Mandarin Chinese Text-To-Speech System
نویسندگان
چکیده
In this paper we present our Hidden Markov Model (HMM)-based, Mandarin Chinese Text-to-Speech (TTS) system. Mandarin Chinese or Putonghua, “the common spoken language”, is a tone language where each of the 400 plus base syllables can have up to 5 different lexical tone patterns. Their segmental and supra-segmental information is first modeled by 3 corresponding HMMs, including: (1) spectral envelop and gain; (2) voiced/unvoiced and fundamental frequency; and (3) segment duration. The corresponding HMMs are trained from a read speech database of 1,000 sentences recorded by a female speaker. Specifically, the spectral information is derived from short-time LPC spectral analysis. Among all LPC parameters, Line Spectrum Pair (LSP) has the closest relevance to the natural resonances or the “formants” of a speech sound and it is selected to parameterize the spectral information. Furthermore, the property of clustered LSPs around a spectral peak justify augmenting LSPs with their dynamic counterparts, both in time and frequency, in both HMM modeling and parameter trajectory synthesis. One hundred sentences synthesized by 4 LSP-based systems have been subjectively evaluated with an AB comparison test. The listening test results show that LSP and its dynamic counterpart, both in time and frequency, are preferred for the resultant higher synthesized speech quality.
منابع مشابه
Variable Speech Rate Mandarin Chinese Text-to-Speech System
This paper presents an Hidden Markov Model (HMM)-based variable speech rate Mandarin Chinese text-to-speech (TTS) system. In this system, parameters of spectrum, fundametal frequency and state duration are generated by a context dependent HMM (CDHMM) whose model parameters are linear-interpolated from those of three CDHMMs trained by corpora in three different speech rates (SRs), i.e. fast, med...
متن کاملTangerine: a large vocabulary Mandarin dictation system
The text input for non-alphabetic languages, such as Chinese, has been a decades-long problem. Chinese Dictation using large vocabulary speech recognition provides a convenient mode of text entry. In contrast to a character based Dictation system [5], a word-based Mandarin dictation system has been designed [3] (based on Apple's PlainTalk speech recognition technology [4]) for efficient entry o...
متن کاملIncorporating Pitch Features for Tone Modeling in Automatic Recognition of Mandarin Chinese
Tone plays a fundamental role in Mandarin Chinese, as it plays a lexical role in determining the meanings of words in spoken Mandarin. For example, these two sentences R R (I like horses) and R M (I like to scold) differ only in the tone carried by the last syllable. Thus, the inclusion of tone-related information through analysis of pitch data should improve the performance of automatic speech...
متن کاملSyllable HMM based Mandarin TTS and comparison with concatenative TTS
This paper introduces a Syllable HMM based Mandarin TTS system. 10-state left-to-right HMMs are used to model each syllable. We leverage the corpus and the front end of a concatenative TTS system to build the Syllable HMM based TTS system. Furthermore, we utilize the unique consonant/vowel structure of Mandarin syllable to improve the voiced/unvoiced decision of HMM states. Evaluation results s...
متن کامل基於發音知識以建構頻譜HMM 之國語語音合成方法 (A Mandarin Speech Synthesis Method Using Articulation-knowledge Based Spectral HMM Structure)[In Chinese]
In this paper, a new HMM structure is proposed to work with a limited training corpus in order to obtain improved synthetic-speech fluency. Spectral fluency is improved because this HMM structure can model the context-dependent spectral characteristics of a speech unit. In addition, instead of using a decision tree to cluster contexts, the knowledge of phoneme articulation is based to cluster c...
متن کامل